Overview

  • Network regression (continuous outcomes)
  • Autologistic Actor-Attribute Model, ALAAM (binary outcomes)

Network regression

Network packages

We will use sna (Butts 2016) and network (Butts 2015).

require(sna)
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following objects are masked from 'package:base':
## 
##     attr, order
## Loading required package: network
## 
## 'network' 1.18.2 (2023-12-04), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
## sna: Tools for Social Network Analysis
## Version 2.7-2 created on 2023-12-05.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
require(network)

Load data

We are looking at the s50 dataset, which is further described here: https://www.stats.ox.ac.uk/~snijders/siena/s50_data.htm

This dataset is available in zipped format online.

temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/s50_data.zip",temp)
X <- as.matrix( read.table(unz(temp, "s50-network1.dat")) )
sport <- read.table(unz(temp, "s50-sport.dat"))
smoke <- read.table(unz(temp, "s50-smoke.dat"))
alcohol <- read.table(unz(temp, "s50-alcohol.dat"))
unlink(temp)
n <- nrow(X)
smoke <- (smoke[,2] %in% c(2,3))+0 # use wave 2; recode categories 2 and 3 (smokers) to 1 and category 1 (non-smokers) to 0
sport <- sport[,1]-1 # dichotomise sporty: recode from {1,2} to {0,1}

Basic descriptives

There are a lot of network metrics implemented in ‘sna’ (check the help file). We will only look at a selection.

Network diagram

The most important first step is to plot the network and get a feeling for it:

gplot(X)

Degree distributions

par(mfrow=c(1,2))# set up two panels
plot(table( degree( X , cmode = "indegree")),main='Indegrees')
plot(table( degree( X , cmode = "outdegree")),main='Outdegrees')

Reciprocity

dyad.census(X)
##      Mut Asym Null
## [1,]  39   35 1151

Clustering coefficient

gtrans(X)
## [1] 0.3873874

Triad census

triad.census(X)
##        003  012  102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U 120C 210
## [1,] 16243 1470 1724    5   18   21   42   30    5    0  15    6    5    2   9
##      300
## [1,]   5

Overall task

The variable Alcohol use is coded as

Value Meaning
1 none
2 once or twice a year
3 once a month
4 once a week
5 more than once a week

We are now going to investigate whether your friends’ alcohol use tends to influence your alcohol use. Start by plotting the network with

  • Node size proportional to alcohol use
  • Colour by smoking status: smokers (original categories 2-3) vs non-smokers (category 1)
gplot(X,vertex.cex=alcohol[,2]/2, 
      vertex.col = ifelse(smoke==1,'red','green') )

Looking at the figure, do you see any evidence of social influence?

Big nodes seem to be tied to other big nodes and small nodes to other small nodes. Smokers also seem to hang out with other smokers.

Ordinary least squares (OLS) regression

Make the assumption that the levels of alcohol use can be treated as an interval-level, continuous variable. To model the outcomes, we start by assuming that outcomes are independent across students, using a regression model

\[ Y_i = \beta_0+\beta_1 m_{i,smoke} + \beta_2 m_{i,sport}+\epsilon_i \]

where

  • \(\beta_0\) is the intercept,
  • \(\beta_1\) is the average difference in alcohol use for smokers relative to non-smokers,
  • \(\beta_2\) is the average difference in alcohol use for sporty people relative to non-sporty people, and
  • the \(\epsilon_i\)’s are assumed to be independent across \(i\) and follow a normal distribution \(N(0,\sigma^2)\).

Fit OLS

y <- alcohol[,2]
ans.ols <- lm(y ~ smoke+sport)
summary(ans.ols)
## 
## Call:
## lm(formula = y ~ smoke + sport)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0922 -0.6918  0.1690  0.8232  2.5694 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.4306     0.3169   7.669 7.98e-10 ***
## smoke         1.4004     0.3084   4.541 3.90e-05 ***
## sport         0.2612     0.3331   0.784    0.437    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.021 on 47 degrees of freedom
## Multiple R-squared:  0.305,  Adjusted R-squared:  0.2755 
## F-statistic: 10.31 on 2 and 47 DF,  p-value: 0.0001933

Question

What conclusions can you draw from the coefficient table of the OLS regression regarding the regressors smoke and sport?

Smokers have on average a 1.4 higher score on drinking, and the coefficient is significantly different from 0 at the 1%-level (and beyond). There is no evidence of sporty people drinking more or less than non-sporty people, as the coefficient is not significantly different from 0.

Residuals

We assumed that the errors were normally distributed so the residuals

\[ \hat{e}_i= y_i - \hat{y}_i= y_i - M_i\hat{\beta} \]

should be normally distributed:

hist(ans.ols$residuals)

The residuals seem reasonably ‘normal’.

Autocorrelation

We are now going to investigate other properties of the model.

For the sna package, we need to put all covariates, including the constant, in a matrix

\[ \mathbf{M} = \begin{pmatrix} M_{1}\\ M_{2}\\ \vdots\\ M_{n}\\ \end{pmatrix}= \begin{pmatrix} 1 & m_{1,1} & \cdots & m_{1,p}\\ 1 & m_{2,1} & \cdots & m_{2,p}\\ \vdots & \vdots & \vdots& \vdots\\ 1 & m_{n,1} & \cdots & m_{n,p}\\ \end{pmatrix} \]

We thus put all covariates into the same matrix:

M <- cbind(matrix(1,n,1),smoke,sport)
colnames(M) <- c('cons','smoke','sport')
head(M)
##      cons smoke sport
## [1,]    1     0     1
## [2,]    1     1     0
## [3,]    1     0     0
## [4,]    1     0     1
## [5,]    1     0     1
## [6,]    1     0     1

Let us investigate whether the residuals are independent across the network. In particular, if the outcome of \(i\) is completely independent of the outcome of \(j\), then we would not expect there to be a difference between \((\hat{e}_i - \bar{e})(\hat{e}_j-\bar{e})\) for a pair that is connected through a tie, \(x_{ij}=1\), and \((\hat{e}_i - \bar{e})(\hat{e}_k-\bar{e})\) for a pair that is not connected, \(x_{ik}=0\). In other words, higher (lower) than predicted values should not be expected just because a network partner has a higher (lower) than predicted value. Note that the average \(\bar{e}=0\) by construction and was just included for clarity. Moran’s I measures exactly whether connected partners tend to have more similar residuals than unconnected people. In terms of the outcome variable

\[ I_k =\frac{n}{\sum_{i=1}^n\sum_{j=1}^n X_{ij}^{(k)}} \frac{\sum_{i=1}^n \sum_{j=1}^n (y_i-\bar{y}) (y_j-\bar{y})X_{ij}^{(k)} }{\sum_{j=1}^n (y_j-\bar{y})^2} \]

Where \(X_{ij}^{(k)}\) is the \(k\)-step adjacency matrix, i.e. \(X_{ij}^{(1)}\) is the matrix of people that are directly connected, \(X_{ij}^{(2)}\) is the matrix of people that are connected at distance 2, etc. The step \(k\) is sometimes called the lag. Intuitively, if \(X_{ij}^{(k)}\) doesn’t matter, then many cross-product terms should cancel each other out. It can be shown that, under the assumption that there is no network correlation at lag \(k=1\), the expected value is

\[ E(I_1) = \frac{-1}{n-1} \]

Here we want to check if this seems to hold true for our residuals - are they uncorrelated across network partners?

nacf(X,ans.ols$residuals,type="moran")
##           0           1           2           3           4           5 
##  1.00000000  0.19495562 -0.17048356  0.02784058 -0.04684902 -0.01678792 
##           6           7           8           9          10          11 
##  0.03002702 -0.06753150 -0.10455140  0.00000000  0.00000000  0.00000000 
##          12          13          14          15          16          17 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##          18          19          20          21          22          23 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##          24          25          26          27          28          29 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##          30          31          32          33          34          35 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##          36          37          38          39          40          41 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##          42          43          44          45          46          47 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##          48          49 
##  0.00000000  0.00000000
# nacf(X,ans.ols$residuals,type="geary") # Geary's C is another measure of autocorrelation but is harder to understand

That \(I_0=1\) is because this is the correlation of residuals with themselves!
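
The lag-1 value can also be reproduced by hand from the formula for \(I_k\) above. A minimal sketch, reusing X, n and ans.ols from the chunks above (nacf may apply slightly different conventions for directed ties, so expect agreement only up to such details):

e <- ans.ols$residuals
num <- sum( outer(e - mean(e), e - mean(e)) * X ) # cross-products over tied pairs only
I.1 <- (n / sum(X)) * num / sum((e - mean(e))^2)
I.1 # compare with the lag-1 value in the output above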

We are only really interested in shorter lags, so let us plot the first 4 and add a reference line for \(E(I_1)\) under no correlation

plot(nacf(X,ans.ols$residuals,type="moran")[1:4],type='b')
abline(h=-1/(n-1),col='red')

Row-normalise

When modelling social influence we may want to scale the network ties to take into account how many ties people have. Leenders (2002) proposes a number of ways in which we can scale the adjacency matrix. Here we create a weight matrix

\[ \mathbf{W} = \begin{bmatrix} W_{11} & W_{12} & \cdots & W_{1n}\\ W_{21} & W_{22} & \cdots & W_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ W_{n1} & W_{n2} & \cdots & W_{nn}\\ \end{bmatrix} \]

We want each arc to have the weight

\[ W_{ij} = X_{ij}/d_{i,out} \]

We can do this easily BUT we have to be careful not to divide by 0 (we define \(0/0\) as \(0\))

degrees <- degree(X, cmode = "outdegree")
W <- X
W[degrees>0,] <- X[degrees>0,]/degrees[degrees>0]

# You can check that we are actually normalising correctly
# Wcopy <- X
# for (i in c(1:n))
# {
#  if (degrees[i]>0)
#  {
#    Wcopy[i,] <- X[i,]/degrees[i]
#  }
#  
#}
#sum( W != Wcopy)

Check residuals again

Check the residuals again

plot(nacf(W,ans.ols$residuals,type="moran")[1:4],type='b')

Do we see any difference? In the formula for \(I_k\) the scaling factors \(d_i\) will actually cancel out.

Network autocorrelation model

The network autocorrelation model, or network disturbances model (Marsden and Friedkin 1993), looks exactly like the OLS model \[ y_i = M_i \beta + \epsilon_i\text{,} \]

but we no longer assume that the residuals are independent. Instead, we induce network autocorrelation on the error terms

\[ \epsilon_i = \rho \sum_{j=1}^n W_{ij}\epsilon_j+\xi_i \]

(\(\xi\) is the Greek letter Xi - https://en.wikipedia.org/wiki/Xi_(letter)). The error terms \(\xi_i\) are assumed to be independent and identically distributed \(\xi_i \thicksim N(0,\sigma^2)\). The interpretation is that if \(i\) has a higher than predicted value on the outcome variable then \(j\) is more likely to also have a higher than predicted value, for all \(j\) that \(i\) has nominated.

If you know a bit of matrix algebra, you will notice that we can write the vector of disturbances \(\epsilon =(\epsilon_1,\ldots,\epsilon_n)^{\top}\) as

\[ \epsilon = \rho \mathbf{W} \epsilon + \xi \]

One immediate issue here is that we have \(\epsilon\) on both sides of the equation. We can simplify this expression by solving for \(\epsilon\)

\[ \epsilon = (I-\rho \mathbf{W})^{-1}\xi \]

You can interpret this as the error terms \(\xi_i\) ‘spreading’ on the network.
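
For a quick numerical illustration that the solved form agrees with the recursive definition, we can iterate the recursion and compare (a sketch reusing W and n from above, with an arbitrary illustrative value of \(\rho\) chosen small enough that the recursion converges):

rho.demo <- 0.3 # illustrative value, not an estimate
xi <- rnorm(n)
eps.solved <- solve(diag(n) - rho.demo * W) %*% xi # the solved form
eps.iter <- xi
for (it in 1:50) eps.iter <- rho.demo * W %*% eps.iter + xi # iterate the definition
max(abs(eps.solved - eps.iter)) # effectively zero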

Fit network autocorrelation model

The function lnam (which presumably stands for linear network autocorrelation models) can fit this model. The model is specified in terms of the outcome variable y and the matrix x of covariates. The weight matrix is specified as W2.

netacm.1 <-lnam(y=y, x=M, W2=W)
## Loading required namespace: numDeriv
summary(netacm.1)
## 
## Call:
## lnam(y = y, x = M, W2 = W)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7914 -0.7067  0.2933  1.0298  2.4933 
## 
## Coefficients:
##        Estimate Std. Error Z value Pr(>|z|)    
## cons     2.5067     0.3157   7.939    2e-15 ***
## smoke    1.0847     0.3299   3.288  0.00101 ** 
## sport    0.2000     0.3026   0.661  0.50869    
## rho2.1   0.6492     0.2995   2.168  0.03017 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       Estimate Std. Error
## Sigma   0.9347      0.009
## 
## Goodness-of-Fit:
##  Residual standard error: 1.047 on 46 degrees of freedom (w/o Sigma)
##  Multiple R-Squared: 0.2039, Adjusted R-Squared: 0.1346
##  Model log likelihood: -68.48 on 45 degrees of freedom (w/Sigma)
##  AIC: 147 BIC: 156.5 
## 
##  Null model: meanstd 
##  Null log likelihood: -79.54 on 48 degrees of freedom
##  AIC: 163.1 BIC: 166.9 
##  AIC difference (model versus null): 16.11 
##  Heuristic Log Bayes Factor (model versus null):  10.38

Did the conclusions about the regressors smoke and sport change?

The difference between smokers and non-smokers remains, but the effect is somewhat smaller.

Is there evidence of network autocorrelation?

The network autocorrelation parameter \(\rho\) is here rho2.1; it is estimated at 0.65 and is statistically significantly different from zero at the 5%-level (p = 0.03).

Looking at the network plot, it almost looks as if high-degree nodes drink more. To take this into account, we add outdegree as an explanatory variable:

M <- cbind(matrix(1,n,1),degrees,smoke,sport )
colnames(M) <- c('cons','degree','smoke','sport')
head(M)
##      cons degree smoke sport
## [1,]    1      3     0     1
## [2,]    1      4     1     0
## [3,]    1      4     0     0
## [4,]    1      4     0     1
## [5,]    1      2     0     1
## [6,]    1      2     0     1

Now re-run the network autocorrelation model (call the output netacm.2 to align with the code in the code chunk below)

netacm.2 <-lnam(y=y, x=M, W2=W)
summary(netacm.2)
## 
## Call:
## lnam(y = y, x = M, W2 = W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91118 -0.71108  0.09031  0.60982  2.34177 
## 
## Coefficients:
##        Estimate Std. Error Z value Pr(>|z|)    
## cons    2.12466    0.39409   5.391    7e-08 ***
## degree  0.08893    0.05627   1.580 0.113994    
## smoke   1.07210    0.31516   3.402 0.000669 ***
## sport   0.25294    0.29605   0.854 0.392900    
## rho2.1  0.66227    0.29190   2.269 0.023278 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       Estimate Std. Error
## Sigma   0.9115      0.009
## 
## Goodness-of-Fit:
##  Residual standard error: 1.032 on 45 degrees of freedom (w/o Sigma)
##  Multiple R-Squared: 0.2479, Adjusted R-Squared: 0.1643
##  Model log likelihood: -67.27 on 44 degrees of freedom (w/Sigma)
##  AIC: 146.5 BIC: 158 
## 
##  Null model: meanstd 
##  Null log likelihood: -79.54 on 48 degrees of freedom
##  AIC: 163.1 BIC: 166.9 
##  AIC difference (model versus null): 16.55 
##  Heuristic Log Bayes Factor (model versus null):  8.902

Does accounting for outdegree change the conclusions about network autocorrelation?

No. The coefficient for degree itself is not statistically significantly different from 0, and the network autocorrelation parameter barely changes (from 0.65 to 0.66).

Network Effects model

The network effects model assumes that there is correlation through the outcome variable \(\mathbf{Y}\). This is subtly different from correlation only through the error term. The outcome \(Y_i\) for \(i\) depends on a weighted sum of the values \(Y_j\) of the contacts of \(i\)

\[ Y_i = \rho \sum_{j=1}^n W_{ij}Y_j + M_i\beta+\epsilon_i \]

where \(\epsilon_i \thicksim N(0,\sigma^2)\) independently for all \(i\).

If you want to model social influence or social contagion, the network effects model is more appropriate than the autocorrelation model, as the former models the dependence between values on the outcome variable.

The equation for the full outcome vector can, as before, be written compactly as

\[ \mathbf{Y} = \rho \mathbf{W} \mathbf{Y} + \mathbf{M}\beta+\epsilon \]

Here there is feedback on the outcome variable.

Stationary distribution

We can derive this model out of a longitudinal model

\[ Y_{i,t} = \rho \sum_{j=1}^n W_{ij}Y_{j,t-1}+M_i\beta+\epsilon_{i,t} \]

as \(t\) tends to infinity.

For a quick illustration, let us look at the first 5 iterations of the updating according to the longitudinal form of the model

rho <- 0.4 # set a strong influence parameter
beta <- matrix(summary(netacm.2)$beta[,1],4,1)# use the estimated coefficients from the last regression
Y <- matrix(rnorm(n),n,1)# random starting point
par(mfrow=c(2,3), oma = c(0,1,0,0) + 0.1,mar = c(1,0,1,1) + 0.1)
coord <- gplot(X,gmode='graph',vertex.cex = Y/2,main= paste('M ',round(nacf(W,Y[,1],type="moran")[2],3) ) )
for (k in c(1:5)){
Y <- rho * W %*% Y + M %*% beta + matrix(rnorm(n),n,1)# update according to equation; %*% for matrix multiplication
gplot(X,gmode='graph',vertex.cex = Y/2, coord=coord, main= paste('M ',round(nacf(W,Y[,1],type="moran")[2],3) ) )
}

We may or may not see increased correlation in the outcomes. Running 100 iterations makes this a little more evident

moran <- matrix(0,100,3)
rho <- .4
Y <- matrix(rnorm(n),n,1)
for (k in c(1:100)){
Y <- rho * W %*% Y + M %*% beta + matrix(rnorm(n),n,1)
moran[k,] <- nacf(W,Y[,1],type="moran")[2:4]}
par(mfrow=c(1,1))
plot(moran[,1],type='l')
lines(moran[,2],col='red')
lines(moran[,3],col='blue')
abline( h = -1/(n-1), col='grey')

We might see a slight trend upwards and correlations are generally speaking positive.

As before, \(Y\) appears on both sides of the equation, but we can solve for \(Y\)

\[ \mathbf{Y} = (I- \rho \mathbf{W})^{-1}(\mathbf{M}\beta+\epsilon) \]
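
Instead of iterating the longitudinal form, we can therefore draw \(\mathbf{Y}\) in one step from the reduced form. A small sketch, reusing rho, W, M and beta from the simulation chunks above:

Y.direct <- solve(diag(n) - rho * W) %*% (M %*% beta + rnorm(n)) # one draw from the model
nacf(W, Y.direct[,1], type="moran")[2] # lag-1 Moran's I will typically be positive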

The same function as before, lnam, is used to fit this model. The difference from the network autocorrelation model is that the weight matrix is specified as W1, not W2.

neteff.2 <-lnam(y=y, x=M, W1=W)
summary(neteff.2)
## 
## Call:
## lnam(y = y, x = M, W1 = W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.04530 -0.70707  0.08996  0.74132  2.41599 
## 
## Coefficients:
##        Estimate Std. Error Z value Pr(>|z|)    
## cons    1.78401    0.42224   4.225 2.39e-05 ***
## degree  0.07223    0.05714   1.264 0.206154    
## smoke   1.14399    0.31259   3.660 0.000253 ***
## sport   0.19588    0.31403   0.624 0.532785    
## rho1.1  0.29178    0.17255   1.691 0.090838 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       Estimate Std. Error
## Sigma   0.9404      0.009
## 
## Goodness-of-Fit:
##  Residual standard error: 1.025 on 45 degrees of freedom (w/o Sigma)
##  Multiple R-Squared: 0.3229, Adjusted R-Squared: 0.2477
##  Model log likelihood: -68.05 on 44 degrees of freedom (w/Sigma)
##  AIC: 148.1 BIC: 159.6 
## 
##  Null model: meanstd 
##  Null log likelihood: -79.54 on 48 degrees of freedom
##  AIC: 163.1 BIC: 166.9 
##  AIC difference (model versus null): 14.98 
##  Heuristic Log Bayes Factor (model versus null):  7.336

There is no evidence of social influence on alcohol use at the 5%-level (rho1.1 has p = 0.09).

Simplify model

Fit a regression that only includes the intercept and smoke, and that uses the original, non-normalised adjacency matrix

neteff.2 <-lnam(y=y, x=M[,c(1,3)], W1=X)
summary(neteff.2)
## 
## Call:
## lnam(y = y, x = M[, c(1, 3)], W1 = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91250 -0.60102 -0.06224  0.75113  2.24134 
## 
## Coefficients:
##        Estimate Std. Error Z value Pr(>|z|)    
## cons    2.21840    0.25578   8.673  < 2e-16 ***
## smoke   1.10413    0.30849   3.579 0.000345 ***
## rho1.1  0.06896    0.03219   2.142 0.032153 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       Estimate Std. Error
## Sigma   0.9503      0.009
## 
## Goodness-of-Fit:
##  Residual standard error: 1.013 on 47 degrees of freedom (w/o Sigma)
##  Multiple R-Squared: 0.3161, Adjusted R-Squared: 0.2724
##  Model log likelihood: -68.59 on 46 degrees of freedom (w/Sigma)
##  AIC: 145.2 BIC: 152.8 
## 
##  Null model: meanstd 
##  Null log likelihood: -79.54 on 48 degrees of freedom
##  AIC: 163.1 BIC: 166.9 
##  AIC difference (model versus null): 17.9 
##  Heuristic Log Bayes Factor (model versus null):  14.07

Question

What do you conclude from the results? Can you relate this to total and average exposure in the diffusion model?

When the matrix is not row-normalised there is evidence of social influence. This suggests that it is the number of friends who drink (total exposure) that exerts influence on you, and not the proportion of friends (average exposure).

ALAAM

This tutorial introduces aspects of Bayesian estimation for autologistic actor-attribute models (ALAAMs) (Robins, Pattison, and Elliott 2001; Daraganova 2009; Daraganova and Robins 2013) as developed in Koskinen and Daraganova (2022).

Preamble for ALAAM

The main functions are defined in ‘balaam’, which can be “loaded” from GitHub.

Load program

source("https://raw.githubusercontent.com/johankoskinen/ALAAM/main/balaam.R")

Manual

A (proto-)manual, alaam_effects, is available on GitHub.

Dependencies

There are a number of dependencies in the functions, but RStudio should prompt you to install the following if you have not already.

require(MASS)
## Loading required package: MASS
require('mvtnorm')
## Loading required package: mvtnorm
require('coda')
## Loading required package: coda

Network packages

In particular, we will use sna (Butts 2016) and network (Butts 2015)

require(sna)
require(network)

Load data

We are looking at the s50 dataset, which is further described here: https://www.stats.ox.ac.uk/~snijders/siena/s50_data.htm

This dataset is available in zipped format online.

temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/s50_data.zip",temp)
adj <- as.matrix( read.table(unz(temp, "s50-network1.dat")) )
sport <- read.table(unz(temp, "s50-sport.dat"))
smoke <- read.table(unz(temp, "s50-smoke.dat"))
alcohol <- read.table(unz(temp, "s50-alcohol.dat"))
unlink(temp)

Format data

n <- nrow(adj)
adj <- as.matrix(adj) # convert from data.frame to matrix
smoke <- smoke[,2] # use wave 2
smoke[smoke<2] <- 0 # set non-smoker to 0
smoke[smoke>0] <- 1 # set occasional and regular to 1
sport <- sport-1 # recode sporty from {1,2} to {0,1}

Format for estimate.alaam()

The main estimation function estimate.alaam requires data to be in data.frame format (check alaam_effects.pdf)

my.data <- data.frame(smoke=smoke, alcohol=alcohol[,1],sport=sport[,1])

Check data

head(my.data)
##   smoke alcohol sport
## 1     0       3     1
## 2     1       2     0
## 3     0       2     0
## 4     0       2     1
## 5     0       3     1
## 6     0       4     1

Descriptives

We might want to check what the associations are like, for example with alcohol

table(my.data$smoke,my.data$alcohol)
##    
##      1  2  3  4  5
##   0  4 13 10  5  1
##   1  1  3  2  9  2

and with outdegree

boxplot(degree( adj , cmode = "outdegree") ~ my.data$smoke)

Autologistic actor-attribute model

The social influence model developed by Robins, Pattison, and Elliott (2001), later elaborated by Daraganova (2009) and Daraganova and Robins (2013), and now referred to as the autologistic actor-attribute model (ALAAM), is a model for binary nodal attributes \(\boldsymbol{y}= \{Y_i:1 \leq i \leq n \}\), conditional on a network adjacency matrix \(\mathbf{X} = \{ X_{ij}: (i,j)\in V \times V \}\). \[ p_{\boldsymbol{\theta}}(\boldsymbol{y} | \mathbf{X} ) = \exp \left\{ \boldsymbol{\theta}^{\top} \boldsymbol{z}(\boldsymbol{y},\mathbf{X}) - \psi(\boldsymbol{\theta}; \mathbf{X}) \right\} \] Here \(\boldsymbol{z}(\boldsymbol{y},\mathbf{X})\) is a \(p\times 1\) vector of statistics calculated for the dependent variable \(\boldsymbol{y}\) and the network \(\mathbf{X}\).

This part of the tutorial takes you through the Bayesian inference scheme of Koskinen and Daraganova (2022).

Target of inference

The aim of the MCMC of Koskinen and Daraganova (2022) is to draw samples from and thereby approximate the posterior distribution \[ \pi(\boldsymbol{\theta} | \boldsymbol{y},\mathbf{X}) \propto p_{\boldsymbol{\theta}}(\boldsymbol{y} | \mathbf{X} ) \pi(\boldsymbol{\theta}) = \exp \left\{ \boldsymbol{\theta}^{\top} \boldsymbol{z}(\boldsymbol{y},\mathbf{X}) - \psi(\boldsymbol{\theta}; \mathbf{X}) \right\} \pi(\boldsymbol{\theta} ){\text{,}} \] where \(\pi(\boldsymbol{\theta} )\) is the prior distribution of the parameters. We need to use MCMC because the normalising constant of \(\pi(\boldsymbol{\theta} | \boldsymbol{y},\mathbf{X})\) is not analytically tractable (nor is the normalising constant of the model, \(\psi(\boldsymbol{\theta}; \mathbf{X})\)).

Specifying a model

As for the functions lm and glm, the intuition of the formula is that the LHS is some transformation of the RHS. For ALAAM, we can think of the model specification in terms of the conditional logits

\[ {\rm{logit}}\left[ \Pr(Y_i = 1 \mid \boldsymbol{y}_{-i},\boldsymbol{\theta}) \right]=\boldsymbol{\theta}^{\top}\boldsymbol{\omega}_i(\boldsymbol{y},\mathbf{X})=\theta_1\omega_{1i}(\boldsymbol{y},\mathbf{X})+\cdots +\theta_p\omega_{pi}(\boldsymbol{y},\mathbf{X}) \] where the change statistics

\[ \omega_{ji}(\boldsymbol{y},\mathbf{X})=z_j(\Delta_i^+\boldsymbol{y},\mathbf{X})-z_j(\Delta_i^-\boldsymbol{y},\mathbf{X}) \] are the differences in statistics evaluated on \(\boldsymbol{y}\) when the variable \(Y_i\) is forced to be \(y_i=1\), for \(\Delta_i^+\boldsymbol{y}\), and forced to be \(y_i=0\), for \(\Delta_i^-\boldsymbol{y}\).
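
As an illustration, a change statistic can be computed by brute force: evaluate the statistic with \(y_i\) forced to 1 and forced to 0, and take the difference. A sketch with a hypothetical helper (not part of balaam), assuming adj and my.data from above:

change.stat <- function(z, y, i, X) {
  y.plus <- y; y.plus[i] <- 1 # Delta_i^+ y
  y.minus <- y; y.minus[i] <- 0 # Delta_i^- y
  z(y.plus, X) - z(y.minus, X)
}
z.simple <- function(y, X) sum( outer(y, y) * X ) # the 'simple' contagion statistic
change.stat(z.simple, my.data$smoke, 1, adj) # change statistic for actor 1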

In other words we can think of

y ~ z1+z2+...+zp
## y ~ z1 + z2 + ... + zp

as stating that the LHS (y) should be the conditional logit, and the RHS be the “linear predictor”.

Types of effects

As further described in alaam.terms (alaam_effects.pdf), there are four classes of effects:

  1. Covariate effects
  2. Network metrics
  3. Contagion effects
  4. Interactions of monadic covariates and contagion effects

Covariate effects

The effect of a covariate x on a dichotomous outcome variable y is specified as a standard formula

y ~ x
## y ~ x

The effect of an additional monadic covariate z is added with +z, e.g.

y ~ x + z
## y ~ x + z

To include an interaction of two monadic covariates, the interaction symbol * is used

y ~ x + z + x*z
## y ~ x + z + x * z

The following standard formula functions are not supported:

y ~ x + I(x^2)# create the squared term manually and add it to data frame instead
## y ~ x + I(x^2)
y ~ x + I(x>0)# create the indicator term manually and add it to data frame instead
## y ~ x + I(x > 0)
y ~ -1 +x # the intercept term cannot be removed
## y ~ -1 + x

Network effects

It is natural to include network metrics in your model. You are free to precalculate any network measures that you find useful and add them as monadic covariates. For example, you may want to add the effect of betweenness centrality or closeness centrality, clc <- closeness(adj), to your model, in which case you add clc to the data frame.

Some of the network statistics that can be derived from the network activity dependence assumption (Section 3.1.1 Network activity dependence, and also Indirect Structural Influence, Koskinen and Daraganova (2022)) are available as built-in effects. For example, for a directed network, you can specify the effect of sending ties on the outcome y as

y ~ odegree
## y ~ odegree

The effects that are available from formula are:

Effect name Formula Description
degree \(y_ix_{i\cdot}=y_i\sum_j x_{ij}\) For undirected networks, this measures the association of degree centrality and the probability of success
idegree \(y_ix_{\cdot i}=y_i\sum_j x_{ji}\) For directed networks, this measures the association of in-degree centrality and the probability of success
odegree \(y_ix_{i\cdot}=y_i\sum_j x_{ij}\) For directed networks, this measures the association of out-degree centrality and the probability of success
recipties \(y_i\sum_j x_{ij}x_{ji}\) For directed networks, this measures the association of reciprocated ties and the probability of success
twostar \(y_i {\binom{x_{i\cdot}}{2}}\) For undirected networks, the effect of centrality over and above degree
intwostar \(y_i\binom{x_{\cdot i}}{2}\) For directed networks, the effect of indegree centrality over and above indegree
outtwostar \(y_i\binom{x_{i\cdot}}{2}\) For directed networks, the effect of outdegree centrality over and above outdegree
threestars \(y_i\binom{x_{i\cdot}}{3}\) For undirected networks, the effect of degree centrality over and above twostars
twopath \(y_i(x_{\cdot i}x_{i\cdot}-\sum_j x_{ij}x_{ji})\) For directed networks, the association of brokerage on the probability of success.
inthreestar \(y_i\binom{x_{\cdot i}}{3}\) For directed networks, the effect of indegree centrality over and above intwostars
outthreestar \(y_i\binom{x_{i\cdot}}{3}\) For directed networks, the effect of outdegree centrality over and above outtwostars
transties \(y_i\sum_{j,k \neq i}x_{ij}x_{ik}x_{jk}\) For (directed) undirected networks, the effect on probability of success of being embedded in (transitive) triads
indirties \(y_i\sum_j x_{ij} \sum_{k}(1-x_{ik})x_{jk}\) For (directed) undirected networks, the effect on probability of success of having ties to people that have ties to many people you are not directly tied to (see 3.1.3 Indirect network and contagion dependencies, Koskinen and Daraganova, for details)

Contagion effects

Contagion effects are parameters of statistics that capture dependence among outcomes. These can be interpreted in terms of conditional distributions, for example for “simple” or “direct contagion” \[ {\rm{logit}}\left[\Pr(Y_i = 1 \mid \boldsymbol{y}_{-i},\boldsymbol{\theta})\right]=\theta_{DC}\sum_{j\neq i}y_jx_{ij}+c. \]
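
On this scale the interpretation is the familiar logistic-regression one: every network partner with \(y_j=1\) shifts the log-odds of \(Y_i=1\) by \(\theta_{DC}\). A quick numerical illustration with made-up values for \(\theta_{DC}\) and the remaining linear predictor \(c\):

theta.DC <- 0.7; c0 <- -2 # hypothetical values, for illustration only
plogis(theta.DC * 0:3 + c0) # P(Y_i = 1) with 0, 1, 2, 3 partners that are '1'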

At the moment, the contagion effects that are implemented are:

Effect name Formula Description
simple \(\sum_{i,j}y_iy_jx_{ij}\) is the probability of success increased by being connected to actors whose outcome is a success
recip \(\sum_{i,j}y_iy_jx_{ij}x_{ji}\) is the probability of success increased by being mutually tied to actors whose outcome is a success (directed networks only)
indirect \(\sum_{i}y_i\sum_{j}x_{ij}\sum_{k\neq i,j}y_kx_{jk}\) is the probability of success increased by being indirectly connected to actors whose outcome is a success (see 3.1.3 Indirect network and contagion dependencies, Koskinen and Daraganova, for details)
closedind \(\sum_{i}y_i\sum_{j}x_{ij}\sum_{k\neq i,j}y_kx_{ik}x_{jk}\) is the probability of success increased by being both indirectly and directly connected to actors whose outcome is a success (see 3.1.3 Indirect network and contagion dependencies as well as supplementary material, Koskinen and Daraganova, for details)
transitive \(\sum_{i}y_i\sum_{j}x_{ij}y_{j}\sum_{k\neq i,j}y_kx_{ik}x_{jk}\) is the probability of success increased by being embedded in triads where the two other members have success on the outcome (see 3.1.3 Indirect network and contagion dependencies as well as supplementary material, Koskinen and Daraganova, for details)

Estimation examples

We will now illustrate two minimal examples of estimating models.

Markov models

For a Markov model (Robins, Pattison, and Elliott 2001), the sufficient statistics are degrees \(x_{i\cdot}=\sum_j x_{ij}\), two-stars \(\binom{x_{i\cdot}}{2}\), three-stars \(\binom{x_{i\cdot}}{3}\), and triangles \(\sum_{j,k \neq i}x_{ij}x_{ik}x_{jk}\). These could be pre-calculated and used as monadic covariates but we will draw on the functionality of balaam.

Independent outcomes

Assume a model for smoke, where we include the effects

  • The effect of alcohol consumption on the probability of smoking
  • The effect of doing sports on the probability of smoking
  • The effect of nominating many friends (odegree) on the probability of smoking
res.0 <- estimate.alaam(smoke ~odegree+alcohol+sport, my.data, adjacency=adj,
                           Iterations=1000)
## Network is directed with  50 nodes 
## [1] "you have p:  5"
## 
## observed stats:  17 25 43 59 11
## number of covariates:  3
## A thinning of  1 , 
## (parameter) burn-in of  1 , and 
## iterations of  1000 ,
## yields a sample size of  1000 .
## Estimation using pseudo likelihood:
## 
## (Printing every  100  iterations)
## 
## you have done  100  iterations out of  1000  
## theta: -1.977 0 0.096 0.627 -1.038
## you have done  200  iterations out of  1000  
## theta: -4.29 0 0.256 1.32 -1.478
## you have done  300  iterations out of  1000  
## theta: -6.008 0 0.866 1.13 -0.882
## you have done  400  iterations out of  1000  
## theta: -2.749 0 0.141 0.654 -0.496
## you have done  500  iterations out of  1000  
## theta: -6.201 0 0.63 1.248 -0.485
## you have done  600  iterations out of  1000  
## theta: -4.584 0 0.065 1.052 0.583
## you have done  700  iterations out of  1000  
## theta: -2.732 0 0.177 0.858 -0.675
## you have done  800  iterations out of  1000  
## theta: -3.791 0 0.247 0.995 -1.715
## you have done  900  iterations out of  1000  
## theta: -1.347 0 -0.005 0.758 -1.486
## you have done  1000  iterations out of  1000  
## theta: -1.954 0 0.159 0.327 -0.627
## summaries of the posterior draws:
##              mean      sd     ESS SACF 10 SACF 30
## intercept -3.3954  1.3685 40.9843  0.4775  0.0819
## contagion  0.0000  0.0000  0.0000                
## odegree    0.2458  0.3022 41.1220  0.4555  0.0307
## alcohol    0.9443  0.3677 39.3412  0.4535  0.1931
## sport     -1.0165  0.8287 40.0459  0.4192  0.3181
## 
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently  1 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 5 ); current:  1 /sqrt( 5 )
## Acceptance ratio:  0.484  (idealy around 0.25)

Taking the posterior summary table at face value, only alcohol has a clearly non-zero parameter and effect on smoking, judging by the size of the posterior standard deviation relative to the posterior mean.

Direct contagion

Assume that in addition to the effects in the independent model, we also want to account for social influence by including a direct contagion effect. The effect name for this is simple.

res.DC.0 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
                           Iterations=1000)
## Network is directed with  50 nodes 
## [1] "you have p:  6"
## 
## observed stats:  17 25 40 43 59 11
## number of covariates:  4
## A thinning of  1 , 
## (parameter) burn-in of  1 , and 
## iterations of  1000 ,
## yields a sample size of  1000 .
## 
## you have done  100  iterations out of  1000  
## theta: -6.209 0.369 -0.391 0.224 1.989 -1.028
## you have done  200  iterations out of  1000  
## theta: -5.002 0.927 -1.087 0.873 1.083 -0.449
## you have done  300  iterations out of  1000  
## theta: -4.655 0.213 -0.316 0.698 0.865 -0.246
## you have done  400  iterations out of  1000  
## theta: -2.288 0.681 -0.926 0.091 1.149 -1.794
## you have done  500  iterations out of  1000  
## theta: -4.205 0.303 -1.07 1.009 1.566 -2.08
## you have done  600  iterations out of  1000  
## theta: -0.038 0.85 -0.696 0.033 0.259 -1.787
## you have done  700  iterations out of  1000  
## theta: -1.559 1.056 -0.486 -0.831 0.829 -1.008
## you have done  800  iterations out of  1000  
## theta: -1.173 0.4 -0.622 -0.029 0.853 -1.757
## you have done  900  iterations out of  1000  
## theta: -4.532 0.845 -0.87 0.228 1.017 0.263
## you have done  1000  iterations out of  1000  
## theta: -2.071 0.852 -1.333 0.304 1.154 -2.335
## summaries of the posterior draws:
##              mean      sd     ESS SACF 10 SACF 30
## intercept -3.3779  1.4755 17.7910  0.7121  0.4368
## contagion  0.6553  0.2909 26.6620  0.5746  0.0191
## idegree   -0.7546  0.3637 14.2086  0.7325  0.3522
## odegree    0.3020  0.4178 17.2616  0.7132  0.4410
## alcohol    1.0540  0.3872 24.0355  0.6307  0.2350
## sport     -1.1456  0.6902 31.4042  0.5630  0.3194
## 
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently  1 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 6 ); current:  1 /sqrt( 6 )
## Acceptance ratio:  0.286  (idealy around 0.25)

Taking the posterior summary table at face value, now the contagion parameter also appears clearly non-zero, judging by the size of the posterior standard deviation relative to the posterior mean.

Posterior quantities

The output gives us an ANOVA-like table with posterior means and standard deviations. We can also get this table from the estimation object

res.0$ResTab 
##                  mean          sd         ESS     SACF 10     SACF 30
## intercept -3.39535594  1.36845791 40.98428697  0.47753938  0.08194740
## contagion  0.00000000  0.00000000  0.00000000                        
## odegree    0.24579631  0.30220180 41.12195659  0.45552884  0.03069873
## alcohol    0.94434944  0.36765084 39.34118992  0.45351163  0.19309350
## sport     -1.01649249  0.82873572 40.04592034  0.41923501  0.31813456

and a more detailed results table using

write.res.table(res.0,burnin=1,thin=1)
##   parameter   mean    sd   .025  0.975
## 1 intercept -3.395 1.368 -6.381 -0.870
## 2 contagion  0.000 0.000  0.000  0.000
## 3   odegree  0.246 0.302 -0.305  0.907
## 4   alcohol  0.944 0.368  0.319  1.809
## 5     sport -1.016 0.829 -2.599  0.500

If we were to create a 95% credibility interval for the parameter of alcohol, it would not include 0. The parameter is positive with high posterior probability.
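
The interval can also be computed directly from the stored draws (assuming the columns of res.0$Thetas are named as in the tables above):

quantile(res.0$Thetas[,'alcohol'], probs = c(0.025, 0.975)) # 95% credibility interval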

Explaining the summaries

The summaries in the table are simply summary statistics for the full \(p\)-dimensional posterior. For example,

mean(res.0$Thetas[,1])
## [1] -3.395356
sd(res.0$Thetas[,1])
## [1] 1.368458

The full distribution can be plotted manually

hist(res.0$Thetas[,1])

And bivariate plots can also be made manually

plot(res.0$Thetas[,c('intercept')],res.0$Thetas[,c('odegree')])

Graphing posteriors

In balaam, the function plotPost produces histograms, trace plots, and autocorrelation plots for all parameters in your model.

plotPost(res.0,figname='markov 0',showplot=TRUE)
## Loading required package: xtable
## Parameter  2  all zeros

We see that the posterior distribution for alcohol is concentrated to the right of 0.

MCMC performance

Explaining MCMC

The MCMC algorithm generates a sequence

\[ \boldsymbol{\theta}_0,\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_T \]

of \(T\) parameter draws from the posterior \(\pi(\boldsymbol{\theta} \mid \boldsymbol{y})\). The draws are made by proposing a new value \(\boldsymbol{\theta}^{\ast}\) given a current value \(\boldsymbol{\theta}_t\) in iteration \(t\) \[ \boldsymbol{\theta}^{\ast} \mid \boldsymbol{\theta}_t \thicksim \mathcal{N}_p( \boldsymbol{\theta}_t , \boldsymbol{\Sigma}). \]

This new value is either accepted, and \(\boldsymbol{\theta}_{t+1}\) is set to \(\boldsymbol{\theta}^{\ast}\), or rejected, in which case \(\boldsymbol{\theta}_{t+1}\) is set to \(\boldsymbol{\theta}_{t}\).
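
To fix ideas, here is a self-contained toy random-walk Metropolis sampler for a univariate normal target; it only illustrates the propose/accept mechanics and is not the balaam implementation:

set.seed(1)
T.draws <- 1000
theta.toy <- numeric(T.draws)
log.target <- function(th) dnorm(th, mean = 1, sd = 1.5, log = TRUE) # toy 'posterior'
for (t in 2:T.draws){
  theta.star <- rnorm(1, mean = theta.toy[t-1], sd = 1) # propose around the current value
  if (log(runif(1)) < log.target(theta.star) - log.target(theta.toy[t-1])){
    theta.toy[t] <- theta.star # accept
  } else {
    theta.toy[t] <- theta.toy[t-1] # reject: stay where we are
  }
}
plot( ts(theta.toy) ) # a trace plot of the kind discussed below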

Trace plots

We can plot the sequence of updates \(\boldsymbol{\theta}_0,\ldots,\boldsymbol{\theta}_T\), to get a sense of whether updates are large or small, and if many or few proposed values \(\boldsymbol{\theta}^{\ast}\) are accepted

plot( ts( res.0$Thetas) )

What we are looking for in the trace plots are

  1. Is there too much dependence on the initial value \(\boldsymbol{\theta}_0\) - is there a trend?
  2. Are the changes too small?
  3. Are too many/few updates accepted?

Diagnosing MCMC

Initial values

The default is to set \(\boldsymbol{\theta}_0\) to the maximum likelihood estimate (MLE) for an independent model. These starting values should be sufficiently good for most models. We can, however, look at the contagion parameter for the DC model, for which the parameter is initialised at \(\theta_{DC}=0\)

plot( ts( res.DC.0$Thetas[,2]), ylab=colnames(res.DC.0$Thetas)[2], xlab='Iteration' )
lines( cumsum(res.DC.0$Thetas[,2])/c(1:1000), col='red')

> There is something of an increasing trend

Autocorrelation

Since the MCMC is iterative, the chains could

  • stay in one place for a number of iterations, or
  • make very small updates, if \(\boldsymbol{\theta}^{\ast}\) is close to \(\boldsymbol{\theta}_t\)

This would mean that values \(\boldsymbol{\theta}_{t}\) and \(\boldsymbol{\theta}_{s}\), for iterations \(s\) and \(t\) that are close to each other, are likely to be more similar, more correlated, than for iterations \(s\) and \(t\) that are far apart. This is the first source of serial autocorrelation in the chains. The second source relates to how big the proposed jumps are, i.e. how close is \(\boldsymbol{\theta}^{\ast}\) to the current value \(\boldsymbol{\theta}_t\) in iteration \(t\)? If we make jumps that are too small, values for iterations \(s\) and \(t\) that are close to each other will be highly correlated.

A perfect sampler would propose and accept \(\boldsymbol{\theta}^{\ast}\) regardless of where we currently are in iteration \(t\). If this were the case, then the effective sample size would be equal to the total number of iterations. As a fictitious example, consider drawing 100 normal variates

# WHITE NOISE
theta.hypothetical <- rnorm(100, mean =1, sd=1.5)
par( mfrow= c(1,2) )
plot(theta.hypothetical,type='l')
hist(theta.hypothetical)
abline(v=1)

The draws here randomly fluctuate around the mean (1), and if we project the draws onto a histogram, this gives us the sample from our target distribution.

For ALAAM posteriors we have three ways of checking the autocorrelation.

SACF

The results table provides the sample autocorrelation between draws \(\boldsymbol{\theta}_{t}\) and \(\boldsymbol{\theta}_{t+k}\), for lags \(k=10\) and \(k=30\).

res.DC.0$ResTab
##                  mean          sd         ESS     SACF 10     SACF 30
## intercept -3.37793664  1.47549094 17.79100523  0.71210864  0.43679635
## contagion  0.65526451  0.29086453 26.66197009  0.57455578  0.01912824
## idegree   -0.75457907  0.36366779 14.20860936  0.73246188  0.35222317
## odegree    0.30203517  0.41782726 17.26160145  0.71316465  0.44102787
## alcohol    1.05395604  0.38718390 24.03548280  0.63071328  0.23496987
## sport     -1.14555868  0.69017863 31.40419919  0.56299232  0.31941782
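
The SACF entries can be reproduced with the standard acf function (note that acf indexes from lag 0, so lag 10 is element 11):

acf( res.DC.0$Thetas[,2], plot=FALSE )$acf[11] # SACF 10 for the contagion draws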

plotPost

The plotting function gives us the correlations for all lags.

ACF plot

We can use the standard function acf, for example for the contagion parameter

acf( res.DC.0$Thetas[,2] )

We can compare this with the white noise, where we have 100 independent draws.

acf(theta.hypothetical)

as.numeric(acf(theta.hypothetical,plot=FALSE)[c(5,10)][[1]])
## [1]  0.02049821 -0.11053814

With high autocorrelation we need more iterations to get a “representative” sample from the posterior.

Intuitively, with high autocorrelation

Your posterior will look very different depending on what subset of iterations you look at!

NOTE The long-run behaviour of the Monte Carlo mean \[ \bar{\boldsymbol{\theta}}=\frac{1}{T}\sum_{t=0}^T\boldsymbol{\theta}_t \]

will not be affected by high autocorrelation. The long-run behaviour (and quality) of the Monte Carlo estimate of the variance-covariance matrix \[ \bar{V}(\boldsymbol{\theta})=\frac{1}{T}\sum_{t=0}^T\boldsymbol{\theta}_t\boldsymbol{\theta}_t^{\top}-\bar{\boldsymbol{\theta}}\bar{\boldsymbol{\theta}}^{ \top} \]

and posterior credibility intervals will be (potentially) severely affected.

ESS

The effective sample size (ESS) is an estimate of how many independent draws your sample is worth. For the white noise we have 100 independent draws, so the effective (independent) sample size is equal to the number of draws.

effectiveSize(theta.hypothetical)
## var1 
##  100

For the DC model, we did 1000 iterations and the reported ESS was roughly 15–30 for the parameters. This means that we need roughly

1000/30
## [1] 33.33333

updates in the MCMC for every approximately independent draw.
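
The ESS column in the results table can be recomputed (up to the estimator used) by applying coda's effectiveSize to the stored draws:

effectiveSize( res.DC.0$Thetas[,2] ) # ESS for the contagion draws, cf. the table above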

Improving MCMC

  1. To reduce dependence on the initial value, we can start in a better \(\boldsymbol{\theta}_0\)
  2. To reduce SACF, we can “space out” our sample by only using every \(k\)th iteration - thinning
  3. To reduce SACF, we can optimise how large the proposed updates \(\boldsymbol{\theta}^{\ast}\) are relative to \(\boldsymbol{\theta}_t\), by calibrating \(\boldsymbol{\Sigma}\) in the proposal distribution.

Using prevBayes

We can continue the estimation where the previous estimation finished using prevBayes. We can also set par.burnin and thinning to reduce the SACF. With 5100 iterations, a burn-in of 100, and thinning of 10, we will get a total sample size of (5100-100)/10=500.

# Previous call for reference
#res.DC.0 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
#                           Iterations=1000)
res.DC.1 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
                           Iterations=5100,
                        prevBayes=res.DC.0,# our first estimation
                        par.burnin=100,# discard the first 100 iterations
                        thinning=10)# only use every 10th iteration
## [1] "you have p:  6"
## 
## observed stats:  17 25 40 43 59 11
## number of covariates:  4
## A thinning of  10 , 
## (parameter) burn-in of  100 , and 
## iterations of  5100 ,
## yields a sample size of  501 .
## 
## you have done  510  iterations out of  5100  
## theta: -7.233 0.333 -0.751 0.91 1.976 -1.758
## you have done  1020  iterations out of  5100  
## theta: -4.295 0.555 -1.393 1.601 1 -1.318
## you have done  1530  iterations out of  5100  
## theta: -2.635 0.234 -0.994 0.484 0.991 -0.453
## you have done  2040  iterations out of  5100  
## theta: -2.003 0.795 -0.668 -0.127 0.942 -1.607
## you have done  2550  iterations out of  5100  
## theta: -4.806 0.524 -0.959 0.669 1.296 -0.676
## you have done  3060  iterations out of  5100  
## theta: -1.365 0.145 -0.621 0.082 0.954 -1.213
## you have done  3570  iterations out of  5100  
## theta: -6.175 0.668 -0.187 0.197 1.206 -0.02
## you have done  4080  iterations out of  5100  
## theta: -5.843 1.467 -1.313 1.268 1.076 -1.538
## you have done  4590  iterations out of  5100  
## theta: -5.483 1.076 -1.657 0.902 1.639 -1.88
## you have done  5100  iterations out of  5100  
## theta: -2.9 0.862 -0.434 -0.032 0.693 -1.024
## summaries of the posterior draws:
##               mean       sd      ESS  SACF 10 SACF 30
## intercept  -3.7220   1.7260  63.5302   0.1125        
## contagion   0.6272   0.3020  86.9372   0.0482        
## idegree    -0.7739   0.3486  53.7528   0.1768        
## odegree     0.5038   0.4326  55.3393   0.1267        
## alcohol     1.0246   0.4453  83.6292   0.0903        
## sport      -1.0839   0.8117 108.1557   0.0416        
## 
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently  10 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 6 ); current:  1 /sqrt( 6 )
## Acceptance ratio:  0.4896078  (idealy around 0.25)

Draws 10 iterations apart are now very close to independent, judging by the SACF. ESS values are now roughly in the 50–110 range.

The figures look promising, but the proportion of accepted proposals (acceptance ratio) is high. This suggests that the chains might be making moves that are too small. Check the trace plots

plot( ts( res.DC.1$Thetas ))

Calibrate proposal

If we set recalibrate equal to TRUE, we will estimate \(\hat{\boldsymbol{\Sigma}}=V(\boldsymbol{\theta} \mid \boldsymbol{y})\), and propose moves from \[ \boldsymbol{\theta}^{\ast} \mid \boldsymbol{\theta}_t \thicksim \mathcal{N}_p( \boldsymbol{\theta}_t , \tfrac{c}{\sqrt{p}}\hat{\boldsymbol{\Sigma}}) \] The tuning constant \(c\) is given by the argument scaling.

res.DC.2 <- estimate.alaam(smoke ~idegree+odegree+alcohol+sport+simple, my.data, adjacency=adj,
                           Iterations=5100,
                           prevBayes=res.DC.0,# our first estimation
                           par.burnin=100,# discard the first 100 iterations
                           thinning=10,# only use every 10
                           recalibrate=TRUE,# use proposal variance from previous posterior
                           scaling = 0.55)# scale down
## [1] "you have p:  6"
## 
## observed stats:  17 25 40 43 59 11
## number of covariates:  4
## A thinning of  10 , 
## (parameter) burn-in of  100 , and 
## iterations of  5100 ,
## yields a sample size of  501 .
## 
## you have done  510  iterations out of  5100  
## theta: -3.1 0.343 -0.665 0.355 1.131 -1.491
## you have done  1020  iterations out of  5100  
## theta: -1.043 0.417 -0.43 -0.26 0.933 -2.581
## you have done  1530  iterations out of  5100  
## theta: -1.769 0.718 -1.303 0.342 1.043 -1.713
## you have done  2040  iterations out of  5100  
## theta: -3.569 0.11 -1.452 0.776 2.123 -2.971
## you have done  2550  iterations out of  5100  
## theta: -4.135 0.584 -0.06 0.179 0.603 0.061
## you have done  3060  iterations out of  5100  
## theta: -5.852 0.386 -0.518 0.204 1.409 1.009
## you have done  3570  iterations out of  5100  
## theta: -2.482 0.761 -0.711 0.503 0.759 -1.937
## you have done  4080  iterations out of  5100  
## theta: -4.629 0.245 -1.041 1.383 1.153 -0.677
## you have done  4590  iterations out of  5100  
## theta: -6.768 0.664 -0.978 0.613 1.932 -0.953
## you have done  5100  iterations out of  5100  
## theta: -3.612 0.926 -1.095 0.58 0.831 -0.101
## summaries of the posterior draws:
##              mean      sd     ESS SACF 10 SACF 30
## intercept -3.6637  1.4545 61.0515  0.1076        
## contagion  0.5890  0.3228 45.3047  0.2180        
## idegree   -0.7629  0.4072 43.1752  0.0483        
## odegree    0.4498  0.3832 46.6506  0.2298        
## alcohol    1.0706  0.4167 70.8577  0.1980        
## sport     -0.9663  0.9692 25.2831  0.2912        
## 
## if ESS is small you need to increase the number of Iterations
## if and/or increase the parameter thinning which is currently  10 .
## Alternatively, adjust the proposals by improving the proposal variance-covariance:
## e.g. set PropSigma equal to the covariance of the posterior draws, or
## increase/decrease the scaling of the proposals (default: scaling=1/sqrt( 6 ); current:  0.55 /sqrt( 6 )
## Acceptance ratio:  0.04588235  (idealy around 0.25)
plot( ts( res.DC.2$Thetas ))

Chains seem to make big moves but very few proposals are accepted.

Possible remedies include

  1. Reduce scaling
  2. Buy mixing by increasing iterations and thinning
  3. Recalibrate yet another time

What is a good ESS

That the chains are not trending and that autocorrelation is not too high is more important than a single number. Conceptually, if you have an ESS of, say, 10, then you will only have precision down to the first decimal. With an ESS of 1000, you can have precision down to maybe the second or third decimal.

More on the algorithm

When determining whether to accept \(\boldsymbol{\theta}^{\ast}\), a replicate dataset \(\boldsymbol{y}^{\ast}\) is drawn from the model \[ \boldsymbol{y}^{\ast} \thicksim p_{\boldsymbol{\theta}^{\ast}}(\boldsymbol{y} | \mathbf{X} ) \] The parameter \(\boldsymbol{\theta}^{\ast}\) is accepted into the posterior if \(\boldsymbol{y}^{\ast}\) is sufficiently similar to the observed data \(\boldsymbol{y}\). More specifically, the parameter is accepted with probability \[ \min \left\{1,e^h \right\} \] where \[ h = (\boldsymbol{\theta}_t-\boldsymbol{\theta}^{\ast})^{\top}(\boldsymbol{z}(\boldsymbol{y}^{\ast},\mathbf{X})-\boldsymbol{z}(\boldsymbol{y},\mathbf{X})) \]
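
In code, the accept/reject step amounts to the following sketch, where theta.t, theta.star, z.rep and z.obs are hypothetical stand-ins for the current and proposed parameter vectors and the simulated and observed statistics:

h <- sum( (theta.t - theta.star) * (z.rep - z.obs) ) # the inner product in the formula
accept <- runif(1) < exp( min(0, h) ) # accept with probability min(1, e^h)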

We cannot draw \(\boldsymbol{y}^{\ast}\) directly from the model, but have to rely on MCMC. Similarly to how we generate \(\boldsymbol{\theta}\) iteratively, a sequence \[ \boldsymbol{y}^{\ast}_0,\boldsymbol{y}^{\ast}_1,\ldots,\boldsymbol{y}^{\ast}_M \] is generated and only the last iteration is used. The number \(M\) of iterations used in order to get one draw is set by the argument burnin. This number should be at least \[ M > 0.25 \times n \times \kappa \]

for \(\kappa\) greater than 30 (for this network, \(0.25 \times 50 \times 30 = 375\)).

GOF

To appraise how well the estimated model replicates data, the goodness-of-fit (GOF) procedure simulates replicate data \[ \boldsymbol{y}_t^{(rep)}\thicksim p_{\boldsymbol{\theta}_t}(\boldsymbol{y}^{(rep)} | \mathbf{X} ) \] for each of the parameter draws \(\boldsymbol{\theta}_t\) in our posterior sample. Fit of the replicate data to observed data is then judged by comparing \(\boldsymbol{y}_t^{(rep)}\) to \(\boldsymbol{y}\) on a number of metrics \(S_k:\mathcal{Y}\times\mathcal{X} \rightarrow \mathbb{R}\). The posterior \(p\)-value is defined as \[ \Pr\left(|S_k(\boldsymbol{y}^{(rep)})-\mathbb{E}(S_k(\boldsymbol{y}^{(rep)}))|>|S_k(\boldsymbol{y})-\mathbb{E}(S_k(\boldsymbol{y}^{(rep)}))|\right) \]

GOF Statistics

The pre-programmed statistics \(S_k\) are

GOF-name interpretation statistic
intercept intercept \(\sum y_{i}\)
simple cont. direct contagion through outgoing ties \(\sum y_{i}y_{j}x_{i,j}\)
recip cont. contagion through reciprocated ties \(\sum y_{i}y_{j}x_{i,j}x_{j,i}\)
indirect cont. indirect contagion \(\sum_{j,k}y_ix_{i,j}x_{j,k}y_k\)
closedind cont. contagion in closed triad \(\sum_{j,k}y_ix_{i,j}x_{j,k}x_{i,k}y_k\)
transitive cont. contagion in transitive triple \(\sum_{j,k}x_{i,j}x_{j,k}x_{i,k}y_iy_jy_k\)
outdegree Markov outdegree \(\sum y_{i}\sum_j x_{i,j}\)
indegree Markov indegree \(\sum y_{i}\sum_j x_{j,i}\)
reciprochation Markov reciprocal ties \(\sum y_{i}\sum_j x_{i,j}x_{j,i}\)
instar Markov in-star \(\sum y_{i} {\binom{\sum_j x_{j,i}}{2}}\)
outstar Markov out-star \(\sum y_{i} {\binom{\sum_j x_{i,j}}{2}}\)
twopath Markov mixed star \(\sum y_{i} \sum_{j,k} x_{j,i}x_{i,k}\)
in3star Markov in-three star \(\sum y_{i} \sum x_{j,i}x_{k,i}x_{h,i}\)
out3star Markov out-three star \(\sum y_{i} \sum x_{i,j}x_{i,k}x_{i,h}\)
transitive Markov transitive triangle \(\sum y_i \sum_{j,k}x_{i,j}x_{j,k}x_{i,k}\)
cyclic Markov cyclic triangle \(\sum y_i \sum_{j,k}x_{i,j}x_{j,k}x_{k,i}\)
indirect Markov indirect, non-exclusive ties \(\sum_i y_i \sum_{j} (x_{i,j} x_{j, +} - x_{i,j}x_{j,i})\)
excl.indirect Markov indirect, unique nodes \(\sharp \{ k : x_{ik}=0,\max_j(x_{i,j}x_{j,k})>0 \}\)
prop.alc.alter a user-defined alter attribute variable \(\frac{1}{1+x_{i,+}} \sum x_{i,j}a_{j}\)

Generate GOF distribution

To generate a sample from the model implied by the independent model

sim.0 <- get.gof.distribution(NumIterations=100, # number of vectors to draw
                                  res=res.0, # the ALAAM estimation object that contains model and results
                                  burnin=100, # no. iterations discarded from GOF distribution
                                  contagion ='none') # should be the same as for model fitted
## Simulating GOF took  0.9588392 
## Calculating statistics took  0.118017

Calculate the statistics

gof.table(obs.stats=    sim.0$stats, # observed statistics (including statistics not in the fitted model)
          sim.stats=    sim.0$Sav.gof, # simulated goodness-of-fit statistics
          name.vec= sim.0$gof.stats.names, # names of the statistics calculated; not all are used if undirected
          tabname='ALAAMGofalt', # name of file saved
          pvalues=TRUE, # posterior predictive p-values
          save.tab ='csv' # save a csv file or a LaTeX file
         )
##                      obs    mean   p-val
## intercept         17.000  16.820   0.200
## simple cont.      25.000  20.200   0.130
## recip cont.        7.000   6.550   0.180
## indirect cont.    59.000  38.450   0.055
## closedind cont.   22.000  18.720   0.160
## transitive cont.  13.000  11.450   0.205
## outdegree         43.000  42.830   0.220
## indegree          40.000  46.150   0.115
## reciprochation    26.000  29.280   0.120
## instar            57.000  68.850   0.115
## outstar           44.000  42.720   0.195
## twopath           88.000 101.900   0.125
## in3star           71.000  76.630   0.105
## out3star          22.000  20.940   0.190
## transitive        34.000  34.550   0.220
## cyclic            18.000  25.290   0.035
## indirect          93.000  82.700   0.135
## excl.indirect     53.000  43.860   0.110

The independent model struggles to replicate the contagion effects.
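
The p-values in the table are the posterior predictive p-values defined above. For a single statistic the computation is simply (a sketch with hypothetical objects: sims, a vector of replicate statistics, and obs, the observed value):

ppp <- mean( abs(sims - mean(sims)) > abs(obs - mean(sims)) ) # posterior predictive p-value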

Repeat for the simple contagion model

sim.2 <- get.gof.distribution(NumIterations=100, # number of vectors to draw
                                  res=res.DC.2, # the ALAAM estimation object that contains model and results
                              thinning= 5000, # number of iterations used to draw each y vector
                                  burnin=100, # no. iterations discarded from GOF distribution
                                  contagion ='simple') # should be the same as for model fitted
## Simulating GOF took  4.995309 
## Calculating statistics took  0.04985404
gof.table(obs.stats=    sim.2$stats, # observed statistics included  not fitted statistics
          sim.stats=    sim.2$Sav.gof, # simulated goodness-of-fit statistics
          name.vec= sim.2$gof.stats.names, # names of the statistics calculated; not all are used if undirected
          tabname='ALAAMGofalt', # name of file saved
          pvalues=TRUE, # posterior predictive p-values
          save.tab ='csv', # save a csv file or a LaTeX file
          directed=FALSE)
##                     obs   mean  p-val
## intercept        17.000 16.080  0.215
## simple cont.     25.000 19.780  0.115
## recip cont.       7.000  6.960  0.205
## indirect cont.   59.000 33.520  0.045
## closedind cont.  22.000 16.230  0.105
## transitive cont. 13.000 10.330  0.165
## degree           43.000 40.830  0.220
## reciprochation   26.000 25.840  0.230
## instar           57.000 39.780  0.085
## outstar          44.000 42.430  0.255
## twopath          88.000 76.910  0.170
## in3star          71.000 30.920  0.065
## out3star         22.000 23.400  0.200
## triadic          34.000 35.030  0.220
## cyclic           18.000 20.710  0.150
## indirect         93.000 83.680  0.205
## excl.indirect    53.000 44.170  0.135

No indications of poor fit.

Model selection

GOF only checks if there are features of data that are not adequately captured by the model. We cannot say how much better or worse the GOF for one model is compared to another - either the model fits or it does not (and we do not want to overfit).

To compare models we may quantify the fit of a model, drawing by analogy with standard generalised linear models.

References

Butts, Carter T. 2015. _Network: Classes for Relational Data_. The Statnet Project. https://CRAN.R-project.org/package=network.
———. 2016. _sna: Tools for Social Network Analysis_. https://CRAN.R-project.org/package=sna.
Daraganova, Galina. 2009. Statistical Models for Social Networks and Network-Mediated Social Influence Processes: Theory and Applications. University of Melbourne.
Daraganova, Galina, and Garry Robins. 2013. “Autologistic Actor Attribute Models.” In Exponential Random Graph Models for Social Networks: Theory, Methods and Applications, edited by D. Lusher, J. Koskinen, and G. Robins, 102–14. New York: Cambridge University Press.
Koskinen, Johan, and Galina Daraganova. 2022. “Bayesian Analysis of Social Influence.” Journal of the Royal Statistical Society, Series A 185 (4): 1855–81.
Leenders, Roger Th AJ. 2002. “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks 24 (1): 21–47.
Marsden, Peter V, and Noah E Friedkin. 1993. “Network Studies of Social Influence.” Sociological Methods & Research 22 (1): 127–51.
Robins, Garry, Philippa Pattison, and Peter Elliott. 2001. “Network Models for Social Influence Processes.” Psychometrika 66 (2): 161–89.